AITopics | tts service

EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control

Chen, Haozhe, Chen, Run, Hirschberg, Julia

arXiv.org Artificial Intelligence, Sep-30-2024

While recent advances in Text-to-Speech (TTS) technology produce natural and expressive speech, they lack the option for users to select an emotion and control its intensity. We propose EmoKnob, a framework that allows fine-grained emotion control in speech synthesis with few-shot demonstrative samples of arbitrary emotion. Our framework leverages the expressive speaker representation space made possible by recent advances in foundation voice cloning models. Based on the few-shot capability of our emotion control framework, we propose two methods to apply emotion control on emotions described by open-ended text, enabling an intuitive interface for controlling a diverse array of nuanced emotions. To make the field of emotional speech synthesis more systematic, we introduce a set of evaluation metrics designed to rigorously assess the faithfulness and recognizability of emotion control frameworks. Through objective and subjective evaluations, we show that our emotion control framework effectively embeds emotions into speech and surpasses the emotion expressiveness of commercial TTS services.
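One way to picture few-shot emotion control in a speaker representation space is as a direction vector that is added to a speaker embedding with an adjustable strength. The sketch below is a minimal illustration under that assumption; the abstract does not specify the exact computation, and the function names (`emotion_direction`, `apply_emotion`) and the plain-list embeddings are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def emotion_direction(emotional, neutral):
    """Average the normalized differences between paired embeddings.

    Assumption: each emotional sample is paired with a neutral sample
    from the same speaker, and both are embedded by the same encoder.
    """
    if not emotional or len(emotional) != len(neutral):
        raise ValueError("need the same non-zero number of paired samples")
    total = [0.0] * len(emotional[0])
    for emo, neu in zip(emotional, neutral):
        diff = normalize([a - b for a, b in zip(emo, neu)])
        total = [t + d for t, d in zip(total, diff)]
    return [t / len(emotional) for t in total]


def apply_emotion(speaker, direction, strength):
    """Shift a speaker embedding along the emotion direction.

    `strength` plays the role of the "knob": 0 leaves the voice unchanged.
    """
    return [s + strength * d for s, d in zip(speaker, direction)]


# Toy 2-D embeddings standing in for real encoder outputs.
direction = emotion_direction([[1.0, 0.0], [2.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
shifted = apply_emotion([0.5, 0.5], direction, strength=0.4)
```

In a real system the shifted embedding would be passed to a voice cloning decoder; that step is omitted here because it depends on the specific model.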

emotion, emotion control, speech, (16 more...)

arXiv: 2410.00316

Country:

North America > United States (0.46)
Europe > Monaco (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Digital Einstein Experience: Fast Text-to-Speech for Conversational AI

Rownicka, Joanna, Sprenkamp, Kilian, Tripiana, Antonio, Gromoglasov, Volodymyr, Kunz, Timo P

arXiv.org Artificial Intelligence, Jul-21-2021

We describe our approach to creating and delivering a custom voice for a conversational AI use case. More specifically, we provide a voice for a Digital Einstein character to enable human-computer interaction within the digital conversation experience. To create a voice that fits the context well, we first design a voice character and produce recordings that correspond to the desired speech attributes. We then model the voice. Our solution uses FastSpeech 2 for log-scaled mel-spectrogram prediction from phonemes and Parallel WaveGAN to generate the waveforms. The system takes character input and produces a speech waveform as output. We use a custom dictionary for selected words to ensure their proper pronunciation. Our proposed cloud architecture enables fast voice delivery, making it possible to talk to the digital version of Albert Einstein in real time.
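The custom-dictionary step can be sketched as a lookup that runs before a general grapheme-to-phoneme (G2P) converter. This is a minimal sketch, not the authors' implementation: the dictionary entry, the `fallback_g2p` placeholder, and the function names are all hypothetical and only illustrate the control flow.

```python
import re

# Hypothetical custom dictionary: selected words mapped to a fixed
# phoneme sequence (an illustrative ARPAbet-style entry, not from the paper).
CUSTOM_DICT = {
    "einstein": ["AY1", "N", "S", "T", "AY2", "N"],
}


def fallback_g2p(word):
    """Placeholder for a real G2P model: simply spells the word out."""
    return [ch.upper() for ch in word]


def text_to_phonemes(text, custom_dict=CUSTOM_DICT, g2p=fallback_g2p):
    """Convert character input to a phoneme list, preferring custom entries."""
    words = re.findall(r"[a-z']+", text.lower())
    phonemes = []
    for word in words:
        # Custom pronunciations take priority over the general converter.
        phonemes.extend(custom_dict.get(word) or g2p(word))
    return phonemes


phonemes = text_to_phonemes("Hi Einstein!")
```

The resulting phoneme list would then be fed to the acoustic model (FastSpeech 2 in the abstract) and the vocoder (Parallel WaveGAN); both are left out because their interfaces are not described in the source.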

digital einstein experience, text-to-speech, tts service, (13 more...)

arXiv: 2107.10658

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.54)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.44)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.36)


A Voice Controlled E-Commerce Web Application

Kandhari, Mandeep Singh, Zulkernine, Farhana, Isah, Haruna

arXiv.org Machine Learning, Nov-15-2018

Automatic voice-controlled systems have changed the way humans interact with a computer. Voice or speech recognition systems allow a user to make a hands-free request to the computer, which in turn processes the request and serves the user with appropriate responses. After years of research and development in machine learning and artificial intelligence, voice-controlled technologies have become more efficient and are widely applied in many domains to enable and improve human-to-human and human-to-computer interactions. State-of-the-art e-commerce applications, with the help of web technologies, offer interactive and user-friendly interfaces. However, there are some instances where people, especially those with visual disabilities, are not able to fully experience the serviceability of such applications. A voice-controlled system embedded in a web application can enhance user experience and can provide voice as a means to control the functionality of e-commerce websites. In this paper, we propose a taxonomy of speech recognition systems (SRS) and present a voice-controlled commodity purchase e-commerce application using IBM Watson speech-to-text to demonstrate its usability. The prototype can be extended to other application scenarios such as government service kiosks and can enable analytics of the converted text data for scenarios such as medical diagnosis at clinics.

I. INTRODUCTION. Voice recognition is often used interchangeably with speech recognition; however, voice recognition is primarily the task of determining the identity of a speaker rather than the content of the speaker's speech [1].
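Once a speech-to-text service has returned a transcript, a voice-controlled shop still has to map that text to an application action. The sketch below shows one simple way to do this with pattern matching. The command phrases, action names, and `parse_command` function are hypothetical, and the speech-to-text call itself is omitted, so this is only an illustration of the idea rather than the paper's design.

```python
import re


def parse_command(transcript):
    """Map a transcript string to a hypothetical e-commerce action."""
    # Normalize case and drop surrounding whitespace and end punctuation.
    text = transcript.lower().strip(" .!?")

    match = re.fullmatch(r"(?:search for|find) (.+)", text)
    if match:
        return {"action": "search", "query": match.group(1)}

    match = re.fullmatch(r"add (.+) to (?:my |the )?cart", text)
    if match:
        return {"action": "add_to_cart", "item": match.group(1)}

    if text in ("checkout", "check out"):
        return {"action": "checkout"}

    # Anything unrecognized is passed back so the UI can ask again.
    return {"action": "unknown", "text": text}


result = parse_command("Add blue mug to my cart.")
```

A rule-based parser like this is easy to audit but brittle; a production system would likely need to handle recognition errors and paraphrases.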

application, artificial intelligence, machine learning, (16 more...)

arXiv: 1811.09688

Country:

North America (0.29)
Europe > Germany (0.28)

Genre:

Research Report (1.00)
Overview (0.68)

Industry:

Information Technology > Services > e-Commerce Services (1.00)
Education > Educational Setting (0.93)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
